Results 1 - 20 of 66
1.
Comput Biol Med ; 174: 108399, 2024 Apr 12.
Article in English | MEDLINE | ID: mdl-38615461

ABSTRACT

Glaucoma is one of the leading causes of blindness worldwide. Individuals affected by glaucoma, including patients and their family members, frequently lack dependable support beyond the confines of clinical environments. Nevertheless, seeking advice via the internet can be difficult given the vast amount of disorganized and unstructured material available online. This research explores how Large Language Models (LLMs) can be leveraged to better serve medical research and benefit glaucoma patients. We introduce Xiaoqing, a Natural Language Processing (NLP) model specifically tailored to the glaucoma field, detailing its development and deployment. To evaluate its effectiveness, we conducted two forms of experiments: comparative and experiential. In the comparative analysis, we presented 22 glaucoma-related questions in simplified Chinese to three medical NLP models (Xiaoqing LLMs, HuaTuo, Ivy GPT) and two general models (ChatGPT-3.5 and ChatGPT-4), covering topics ranging from basic glaucoma knowledge to treatment, surgery, research, management standards, and patient lifestyle. Responses were assessed for informativeness and readability. The experiential experiment involved glaucoma patients and non-patients interacting with Xiaoqing; their questions and feedback were collected and analyzed on the same criteria. The findings demonstrated that Xiaoqing notably outperformed the other models in informativeness and readability, suggesting that Xiaoqing represents a significant advancement in the management and treatment of glaucoma in China. We also provide a Web-based version of Xiaoqing, allowing readers to directly experience its functionality, available at https://qa.glaucoma-assistant.com//qa.

2.
Article in English | MEDLINE | ID: mdl-38625773

ABSTRACT

Blind video quality assessment (BVQA) plays an indispensable role in monitoring and improving end-users' viewing experience in various real-world video-enabled media applications. As an experimental field, progress in BVQA has been measured primarily on a few human-rated VQA datasets. Thus, it is crucial to gain a better understanding of existing VQA datasets in order to properly evaluate the current progress in BVQA. Toward this goal, we conduct a first-of-its-kind computational analysis of VQA datasets by designing minimalistic BVQA models. By minimalistic, we mean that our family of BVQA models is built only upon basic blocks: a video preprocessor (for aggressive spatiotemporal downsampling), a spatial quality analyzer, an optional temporal quality analyzer, and a quality regressor, all with the simplest possible instantiations. By comparing the quality prediction performance of different model variants on eight VQA datasets with realistic distortions, we find that nearly all datasets suffer, to varying degrees, from the easy-dataset problem, and some even admit blind image quality assessment (BIQA) solutions. We further justify our claims by comparing the generalization capabilities of our models on these VQA datasets and by ablating a dizzying set of BVQA design choices related to the basic building blocks. Our results cast doubt on the current progress in BVQA and shed light on good practices for constructing next-generation VQA datasets and models.
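
As a concrete illustration of the four basic blocks, the following Python sketch wires a preprocessor, spatial and temporal analyzers, and a regressor together. The specific instantiations here (stride-based downsampling, gradient-energy features, ridge regression) are illustrative assumptions, not the authors' exact choices.

```python
# Minimal sketch of the basic-blocks BVQA pipeline the abstract describes:
# preprocessor -> spatial analyzer -> optional temporal analyzer -> regressor.
import numpy as np

def preprocess(video, t_stride=8, s_stride=4):
    """Aggressive spatiotemporal downsampling: keep every t_stride-th frame
    and every s_stride-th pixel (illustrative choice)."""
    return video[::t_stride, ::s_stride, ::s_stride]

def spatial_features(frame):
    """Toy spatial quality analyzer: local contrast/gradient energy."""
    gy, gx = np.gradient(frame.astype(np.float64))
    return np.array([np.mean(gx**2 + gy**2), frame.std()])

def temporal_features(frames):
    """Optional temporal analyzer: frame-difference energy."""
    diffs = np.diff(frames.astype(np.float64), axis=0)
    return np.array([np.mean(diffs**2)])

def video_features(video):
    clip = preprocess(video)
    spat = np.mean([spatial_features(f) for f in clip], axis=0)
    return np.concatenate([spat, temporal_features(clip)])

def fit_regressor(feature_matrix, mos, lam=1e-3):
    """Quality regressor: ridge regression fitted on human-rated MOS."""
    X = np.hstack([feature_matrix, np.ones((len(mos), 1))])
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ mos)

videos = [np.random.randint(0, 256, (64, 128, 128), dtype=np.uint8) for _ in range(10)]
mos = np.random.uniform(1, 5, 10)                  # placeholder ratings
F = np.stack([video_features(v) for v in videos])
w = fit_regressor(F, mos)
pred = np.hstack([F, np.ones((len(F), 1))]) @ w    # predicted quality scores
```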

3.
Comput Biol Med ; 174: 108431, 2024 Apr 09.
Article in English | MEDLINE | ID: mdl-38626507

ABSTRACT

Skin wrinkles result from intrinsic aging processes and extrinsic influences, including prolonged exposure to ultraviolet radiation and tobacco smoking. The identification of wrinkles therefore holds significant importance in skin-aging and medical aesthetics research. Nevertheless, current methods lack the comprehensiveness to identify facial wrinkles, particularly subtle ones. Furthermore, existing assessment techniques neglect the blurred boundaries of wrinkles and cannot handle images with varying resolutions. This research introduces a novel wrinkle detection algorithm and a distance-based loss function to identify full-face wrinkles. We also develop a wrinkle detection evaluation metric that assesses outcomes based on curve, location, and gradient similarity. We collected and annotated a dataset for wrinkle detection consisting of 1021 images of Chinese faces, which will be made publicly available to further promote wrinkle detection research. Experiments demonstrate that the proposed method substantially improves the detection of subtle wrinkles. Furthermore, the suggested evaluation procedure effectively accounts for the indistinct boundaries of wrinkles and is applicable to images of various resolutions.
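
The paper's exact distance-based loss is not specified in the abstract; the sketch below shows one plausible reading, in which a distance transform down-weights misclassifications near the blurred wrinkle boundary. The Gaussian weighting scheme and the sigma parameter are assumptions for illustration.

```python
# Hedged sketch of a distance-based segmentation loss: background pixels
# close to an annotated wrinkle curve are penalized less, tolerating the
# ambiguous, blurred boundary.
import numpy as np
from scipy.ndimage import distance_transform_edt

def distance_weighted_bce(pred, mask, sigma=3.0, eps=1e-7):
    """pred: wrinkle probabilities in [0, 1]; mask: binary ground truth."""
    # Distance of every pixel to the nearest annotated wrinkle pixel.
    dist = distance_transform_edt(1 - mask)
    # Background near a wrinkle gets low weight (boundary is ambiguous);
    # far background and all wrinkle pixels get full weight.
    weight = np.where(mask == 1, 1.0, 1.0 - np.exp(-(dist**2) / (2 * sigma**2)))
    bce = -(mask * np.log(pred + eps) + (1 - mask) * np.log(1 - pred + eps))
    return float(np.mean(weight * bce))

mask = np.zeros((64, 64)); mask[32, 10:50] = 1     # synthetic wrinkle curve
pred = np.clip(mask + 0.1 * np.random.rand(64, 64), 0, 1)
print(distance_weighted_bce(pred, mask))
```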

4.
Sensors (Basel) ; 24(6)2024 Mar 20.
Article in English | MEDLINE | ID: mdl-38544251

ABSTRACT

Restricted mouth opening (trismus) is one of the most common complications following head and neck cancer treatment. Early initiation of mouth-opening exercises is crucial for preventing or minimizing trismus. Current methods for these exercises predominantly involve finger exercises and traditional mouth-opening training devices. Our research group designed an intelligent mouth-opening training device (IMOTD) that addresses the limitations of traditional home training methods: the inability to quantify mouth-opening exercises, a lack of guided training resulting in temporomandibular joint injuries, and poor training continuity leading to poor outcomes. An interactive remote guidance mode was introduced to address these concerns. The device was designed with a focus on the safety and effectiveness of medical devices. The accuracy of the training data was verified through piezoelectric sensor calibration. Through mechanical analysis, the stress points of the structure were identified, and finite element analysis of the connecting rod and occlusal plate connection structure was conducted to ensure the safety of the device. Preclinical experiments support the effectiveness of the intelligent device in rehabilitation when compared with conventional mouth-opening training methods. The device facilitates the quantification and visualization of mouth-opening training indicators, ensures both the comfort and safety of the training process, and enables remote supervision and guidance for patient training, thereby enhancing patient compliance and ultimately the effectiveness of mouth-opening exercises.
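
Sensor calibration of this kind typically reduces to fitting and validating a linear map from raw sensor output to a reference measurement; the following toy sketch illustrates that step. All values are placeholders, not the device's actual data.

```python
# Toy sketch of a piezoelectric-sensor calibration step: fit a linear map
# from sensor voltage to a reference mouth-opening measurement and check
# the residuals.
import numpy as np

readings = np.array([0.10, 0.52, 1.01, 1.49, 2.02])   # sensor output (V), made up
reference = np.array([0.0, 5.0, 10.0, 15.0, 20.0])    # reference opening (mm), made up
slope, intercept = np.polyfit(readings, reference, 1)
residuals = reference - (slope * readings + intercept)
print(f"opening(mm) = {slope:.2f} * V + {intercept:.2f}, "
      f"max |err| = {np.abs(residuals).max():.2f} mm")
```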


Subjects
Head and Neck Neoplasms , Trismus , Humans , Trismus/etiology , Trismus/rehabilitation , Exercise Therapy/methods , Exercise , Mouth
5.
IEEE Trans Image Process ; 33: 1898-1910, 2024.
Article in English | MEDLINE | ID: mdl-38451761

ABSTRACT

In this paper, we present a simple yet effective continual learning method for blind image quality assessment (BIQA) with improved quality prediction accuracy, plasticity-stability trade-off, and task-order/-length robustness. The key step in our approach is to freeze all convolution filters of a pre-trained deep neural network (DNN) for an explicit promise of stability, and learn task-specific normalization parameters for plasticity. We assign each new IQA dataset (i.e., task) a prediction head, and load the corresponding normalization parameters to produce a quality score. The final quality estimate is computed by a weighted summation of predictions from all heads with a lightweight K-means gating mechanism. Extensive experiments on six IQA datasets demonstrate the advantages of the proposed method in comparison to previous training techniques for BIQA.
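
The abstract does not spell out the gating computation; a plausible minimal reading is sketched below, where each head keeps a K-means codebook of its task's features and heads whose codebooks sit closer to the test feature receive larger weights. The softmax temperature and minimum-centroid-distance rule are assumptions.

```python
# Sketch of K-means-gated weighted summation over per-task prediction heads.
import numpy as np

def kmeans_gate(feature, codebooks, tau=1.0):
    """codebooks: list of (K, D) centroid arrays, one per task/head."""
    d = np.array([np.min(np.linalg.norm(cb - feature, axis=1)) for cb in codebooks])
    w = np.exp(-d / tau)          # closer codebook -> larger gate weight
    return w / w.sum()

def predict_quality(feature, codebooks, heads):
    """heads: callables mapping a feature vector to a quality score."""
    w = kmeans_gate(feature, codebooks)
    scores = np.array([h(feature) for h in heads])
    return float(np.dot(w, scores))   # weighted summation of all heads

rng = np.random.default_rng(0)
codebooks = [rng.normal(t, 1, size=(8, 16)) for t in range(3)]   # 3 tasks
heads = [lambda f, t=t: f.mean() + t for t in range(3)]          # toy heads
print(predict_quality(rng.normal(0, 1, 16), codebooks, heads))
```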

6.
Comput Biol Med ; 170: 108067, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38301513

ABSTRACT

BACKGROUND: Ocular Adnexal Lymphoma (OAL) is a non-Hodgkin's lymphoma that most often appears in the tissues near the eye, and radiotherapy is the currently preferred treatment. The prognostic factors for systemic failure of OAL radiotherapy remain controversial, so thorough evaluation prior to radiotherapy is highly recommended to improve the patient's prognosis and minimize the likelihood of adverse effects. PURPOSE: To investigate the risk factors that contribute to incomplete remission in OAL radiotherapy and to establish a hybrid model for predicting radiotherapy outcomes in OAL patients. METHODS: A retrospective chart review was performed for 87 consecutive patients with OAL who received radiotherapy between February 2011 and August 2022 in our center. Seven image features, derived from MRI sequences, were integrated with 122 clinical features to form comprehensive patient feature sets. Chemometric algorithms were then employed to distill highly informative features from these sets. Based on these refined features, SVM and XGBoost classifiers were trained to classify the effect of radiotherapy. RESULTS: The clinical records of 87 OAL patients (median age: 60 months, IQR: 52-68 months; 62.1% male) treated with radiotherapy were reviewed. Analysis of Lasso (AUC = 0.75, 95% CI: 0.72-0.77) and Random Forest (AUC = 0.67, 95% CI: 0.62-0.70) algorithms revealed four potential features, resulting in an intersection AUC of 0.80 (95% CI: 0.75-0.82). Logistic Regression (AUC = 0.75, 95% CI: 0.72-0.77) identified two features. Furthermore, the integration of chemometric methods such as CARS (AUC = 0.66, 95% CI: 0.62-0.72), UVE (AUC = 0.71, 95% CI: 0.66-0.75), and GA (AUC = 0.65, 95% CI: 0.60-0.69) highlighted six features in total, with an intersection AUC of 0.82 (95% CI: 0.78-0.83). These features included enophthalmos, diplopia, tenderness, elevated ALT count, HBsAg positivity, and CD43 positivity in immunohistochemical tests. CONCLUSION: The findings suggest the effectiveness of chemometric algorithms in pinpointing OAL risk factors, and the proposed prediction model shows promise in helping clinicians identify OAL patients likely to achieve complete remission via radiotherapy. Notably, patients with a history of exophthalmos, diplopia, tenderness, elevated ALT levels, HBsAg positivity, and CD43 positivity are less likely to attain complete remission after radiotherapy. These insights offer more targeted management strategies for OAL patients. The developed model is accessible online at: https://lzz.testop.top/.
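
A simplified scikit-learn sketch of the screen-then-classify workflow described in METHODS follows: Lasso and random-forest screens are intersected, and a classifier is evaluated on the surviving features. All data and hyperparameters below are synthetic placeholders, not the study's values.

```python
# Feature screening by two methods, intersection, then SVM with CV AUC.
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(87, 129))          # 87 patients, 122 clinical + 7 MRI features
y = rng.integers(0, 2, 87)              # 1 = complete remission after radiotherapy

lasso = LassoCV(cv=5).fit(X, y)
sel_lasso = set(np.flatnonzero(np.abs(lasso.coef_) > 1e-6))

rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
sel_rf = set(np.argsort(rf.feature_importances_)[-10:])   # top-10 importances

selected = sorted(sel_lasso & sel_rf)    # intersection of the two screens
if selected:
    auc = cross_val_score(SVC(probability=True), X[:, selected], y,
                          cv=5, scoring="roc_auc").mean()
    print(f"features: {selected}, CV AUC: {auc:.2f}")
```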


Subjects
Eye Neoplasms , Lymphoma, Non-Hodgkin , Humans , Male , Child, Preschool , Female , Retrospective Studies , Chemometrics , Diplopia , Hepatitis B Surface Antigens , Eye Neoplasms/diagnostic imaging , Eye Neoplasms/radiotherapy , Lymphoma, Non-Hodgkin/diagnostic imaging , Lymphoma, Non-Hodgkin/radiotherapy , Lymphoma, Non-Hodgkin/pathology , Algorithms
7.
Article in English | MEDLINE | ID: mdl-38376963

ABSTRACT

Video compression is indispensable to most video analysis systems. While it saves transmission bandwidth, it also deteriorates downstream video understanding tasks, especially at low-bitrate settings. To systematically investigate this problem, we first thoroughly review previous methods, revealing that three principles, i.e., task-decoupled, label-free, and data-emerged semantic prior, are critical to a machine-friendly coding framework but have not been fully satisfied so far. In this paper, we propose a traditional-neural mixed coding framework that simultaneously fulfills all these principles by taking advantage of both traditional codecs and neural networks (NNs). On one hand, traditional codecs can efficiently encode the pixel signal of videos but may distort the semantic information. On the other hand, highly non-linear NNs are proficient in condensing video semantics into a compact representation. The framework is optimized by ensuring that a transmission-efficient semantic representation of the video is preserved with respect to the coding procedure, and this representation is learned from unlabeled data in a self-supervised manner. The videos collaboratively decoded from the two streams (codec and NN) are rich in semantics as well as visually photo-realistic, empirically boosting performance on several mainstream downstream video analysis tasks without any post-adaptation procedure. Furthermore, by introducing an attention mechanism and an adaptive modeling scheme, the video semantic modeling ability of our approach is further enhanced. Finally, we build a low-bitrate video understanding benchmark with three downstream tasks on eight datasets, demonstrating the notable superiority of our approach. All codes, data, and models will be open-sourced to facilitate future research.
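
The self-supervised preservation objective can be pictured as keeping a semantic encoder's output stable across the coding pipeline. The PyTorch sketch below uses a toy encoder and a straight-through quantizer as a codec stand-in; both are assumptions for illustration only, not the paper's architecture.

```python
# Conceptual sketch: a label-free semantic encoder g should produce similar
# representations before and after coding, giving a self-supervised loss.
import torch
import torch.nn as nn
import torch.nn.functional as F

g = nn.Sequential(nn.Conv2d(3, 16, 3, 2, 1), nn.ReLU(),
                  nn.Conv2d(16, 32, 3, 2, 1), nn.AdaptiveAvgPool2d(1),
                  nn.Flatten())                        # toy semantic encoder

def codec_proxy(x, levels=16):
    """Differentiable stand-in for a traditional codec: coarse quantization
    with a straight-through gradient estimator."""
    q = torch.round(x * levels) / levels
    return x + (q - x).detach()

frames = torch.rand(4, 3, 64, 64)
loss = F.mse_loss(g(codec_proxy(frames)), g(frames))   # preserve semantics w.r.t. coding
```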

8.
Comput Biol Med ; 171: 108212, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38422967

ABSTRACT

BACKGROUND: Deep learning-based super-resolution (SR) algorithms aim to reconstruct low-resolution (LR) images into high-fidelity high-resolution (HR) images by learning the low- and high-frequency information. In medical applications, high-quality reconstruction of LR digital medical images fulfills experts' diagnostic requirements. PURPOSE: Medical image SR algorithms should support arbitrary resolutions with high efficiency, yet no existing study addresses both requirements. Several SR studies on natural images have achieved reconstruction at unrestricted resolutions, but these methods are hard to deploy in medical applications because their large model size significantly limits efficiency. Hence, we propose a highly efficient method for reconstructing medical images at any desired resolution. METHODS: The statistics of medical images exhibit greater continuity across neighboring pixels than those of natural images, making medical image reconstruction comparatively less challenging. Exploiting this property, we develop a neighborhood evaluator that represents the continuity of the neighborhood while controlling the network's depth. RESULTS: The proposed method shows superior performance across seven reconstruction scales, as evidenced by experiments conducted on panoramic radiographs and two external public datasets. Furthermore, the proposed network decreases the parameter count by over 20× and the computational workload by over 10× compared to prior work. On large-scale reconstruction, inference speed is improved by over 5×. CONCLUSION: The proposed SR strategy performs efficient medical image reconstruction at arbitrary resolutions, marking a significant step forward in the field and facilitating the deployment of SR on mobile medical platforms.
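
One way to picture a neighborhood evaluator controlling reconstruction effort at arbitrary scales is sketched below: local continuity (approximated here by inverse local variance) gates how much extra correction is applied on top of plain interpolation. This toy stand-in is not the paper's learned evaluator.

```python
# Illustrative arbitrary-scale reconstruction gated by neighborhood continuity.
import numpy as np
from scipy.ndimage import zoom, uniform_filter

def neighborhood_continuity(img, size=3):
    """High where neighboring pixels are similar (smooth tissue),
    low at edges. Values in (0, 1]."""
    mean = uniform_filter(img, size)
    var = uniform_filter(img**2, size) - mean**2
    return 1.0 / (1.0 + var)

def arbitrary_scale_sr(lr, scale, correction_fn):
    """Upscale by any positive float factor; weight the (expensive)
    correction by how discontinuous each region is."""
    base = zoom(lr.astype(np.float64), scale, order=3)     # spline upsampling
    cont = zoom(neighborhood_continuity(lr.astype(np.float64)), scale, order=1)
    return base + (1.0 - cont) * correction_fn(base)

lr = np.random.rand(32, 32)
sr = arbitrary_scale_sr(lr, 2.7, correction_fn=lambda x: 0.0 * x)  # placeholder net
print(sr.shape)   # (86, 86) -- any non-integer scale works
```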


Subjects
Algorithms , Image Processing, Computer-Assisted , Image Processing, Computer-Assisted/methods
9.
Phenomics ; 3(5): 469-484, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37881321

ABSTRACT

Thyroid cancer, a common endocrine malignancy, is one of the leading causes of death among endocrine tumors. Diagnosis based on pathological section analysis suffers from delays and cumbersome operating procedures. We therefore construct models based on spectral data that can potentially be used for rapid intraoperative papillary thyroid carcinoma (PTC) diagnosis and for characterizing PTC. To alleviate any concerns pathologists may have about using the models, we analyzed the selected bands so that they can be interpreted pathologically. A spectra acquisition system was first built to acquire spectra of pathological section images from 91 patients. The resulting dataset contains 217 spectra of normal thyroid tissue and 217 spectra of PTC tissue. Clinical data of the corresponding patients were collected for subsequent model interpretability analysis. The experiment was approved by the Ethics Review Committee of the Wuhu Hospital of East China Normal University. The spectra were preprocessed, and the signals optimized by primary and secondary informative-wavelength selection were used to develop the PTC detection models. The model using mean centering (MC) and multiplicative scatter correction (MSC) achieved the best performance, and the reasons for this were analyzed in combination with the spectral acquisition process and the composition of the test slide. For interpretability, the selected near-ultraviolet band corresponds to the location of an amino acid absorption peak, consistent with the clinical observation of significantly lower amino acid concentrations in PTC patients. Moreover, the selected hemoglobin absorption peak is consistent with the low hemoglobin index in PTC patients. In addition, correlation analysis between the selected wavelengths and the clinical data shows that the reflection intensity of the selected wavelengths in normal cells has a moderate correlation with cell arrangement structure, nucleus size, and free thyroxine (FT4), and a strong correlation with triiodothyronine (T3), while the reflection intensity of the selected bands in PTC cells has a moderate correlation with free triiodothyronine (FT3).
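
MC and MSC are standard chemometric preprocessing steps with well-known definitions; a direct NumPy implementation of both, applied to a spectra matrix of shape (n_samples, n_wavelengths), is given below. The example data are synthetic.

```python
# Mean centering (MC) and multiplicative scatter correction (MSC).
import numpy as np

def mean_center(spectra):
    return spectra - spectra.mean(axis=0, keepdims=True)

def msc(spectra):
    """Regress each spectrum against the mean spectrum and remove the
    fitted additive (intercept) and multiplicative (slope) scatter effects."""
    ref = spectra.mean(axis=0)
    corrected = np.empty_like(spectra, dtype=np.float64)
    for i, s in enumerate(spectra):
        slope, intercept = np.polyfit(ref, s, 1)
        corrected[i] = (s - intercept) / slope
    return corrected

spectra = np.random.rand(434, 601)     # e.g., 434 spectra over 601 wavelengths
processed = mean_center(msc(spectra))  # MSC first, then MC, as one plausible order
```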

10.
Comput Biol Med ; 165: 107344, 2023 10.
Article in English | MEDLINE | ID: mdl-37603961

ABSTRACT

Medical record images in EHR systems contain private user data and are a valuable asset, and there is an urgent need to protect them. Image steganography offers a potential solution. We therefore develop a steganographic model for medical record images based on StegaStamp. In contrast to natural images, medical record images are document images, which are highly vulnerable to image cropping attacks; we therefore use text region segmentation and watermark region localization to counter such attacks. The distortion network is designed to account for the distortions that can occur during the transmission of medical record images, making the model robust against communication-induced distortions. In addition, building on StegaStamp, we introduce FISM as part of the loss function to reduce ripple textures in the steganographic image. Experimental results show that the designed distortion network and the FISM loss term are well suited to the steganographic task for medical record images, in terms of both decoding accuracy and image quality.
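
The distortion network can be read as a differentiable augmentation pipeline applied to the stego image during training, so the decoder learns to survive transmission artifacts and cropping. The specific distortions and parameters in this PyTorch sketch are assumptions, not the paper's exact design.

```python
# Sketch of a training-time distortion pipeline for robust steganography.
import torch
import torch.nn.functional as F

def distortion_network(img):
    """img: (B, C, H, W) in [0, 1]. Random noise, resampling blur, and a
    random crop-and-resize that mimics an image cropping attack."""
    b, c, h, w = img.shape
    img = (img + 0.02 * torch.randn_like(img)).clamp(0, 1)         # sensor noise
    img = F.interpolate(F.interpolate(img, scale_factor=0.5, mode="bilinear"),
                        size=(h, w), mode="bilinear")              # resampling
    top = torch.randint(0, h // 4, (1,)).item()
    left = torch.randint(0, w // 4, (1,)).item()
    crop = img[:, :, top: top + 3 * h // 4, left: left + 3 * w // 4]
    return F.interpolate(crop, size=(h, w), mode="bilinear")       # crop attack

stego = torch.rand(2, 3, 128, 128)
attacked = distortion_network(stego)   # the decoder is trained on these
```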


Subjects
Confidentiality , Medical Records , Medical Informatics
11.
Front Neurosci ; 17: 1187619, 2023.
Article in English | MEDLINE | ID: mdl-37456990

ABSTRACT

Aim: To evaluate the utility of binocular chromatic pupillometry for detecting impaired pupillary light response (PLR) in patients with primary open-angle glaucoma (POAG), and to assess the feasibility of using a binocular chromatic pupillometer for opportunistic POAG diagnosis in community-based or telemedicine-based services. Methods: In this prospective, cross-sectional study, 74 patients with POAG and 23 healthy controls were enrolled. All participants underwent comprehensive ophthalmologic examinations including optical coherence tomography (OCT) and standard automated perimetry (SAP). The PLR tests included sequential tests of full-field chromatic stimuli weighted by rods, intrinsically photosensitive retinal ganglion cells (ipRGCs), and cones (Experiment 1), as well as an alternating chromatic light flash-induced relative afferent pupillary defect (RAPD) test (Experiment 2). In Experiment 1, the constriction amplitude, velocity, and time to maximum constriction/dilation were calculated for the three cell-type-weighted responses, and the post-illumination response of the ipRGC-weighted response was evaluated. In Experiment 2, the infrared pupillary asymmetry (IPA) amplitude and anisocoria duration induced by intermittent blue or red light flashes were calculated. Results: In Experiment 1, the PLR of POAG patients was significantly reduced under all conditions, reflecting defective photoreception through rods, cones, and ipRGCs. The variable with the highest area under the receiver operating characteristic curve (AUC) was time to max dilation under the ipRGC-weighted stimulus, followed by the constriction amplitude under the cone-weighted stimulus and the constriction amplitude under the ipRGC-weighted stimulus. The impaired PLR features were associated with greater visual field loss, thinner retinal nerve fiber layer (RNFL) thickness, and cupping of the optic disk. In Experiment 2, IPA and anisocoria duration induced by intermittent blue or red light flashes were significantly greater in participants with POAG than in controls. IPA and anisocoria duration had good diagnostic value, correlating with the inter-eye asymmetry of visual field loss. Conclusion: We demonstrate that binocular chromatic pupillometry could potentially serve as an objective clinical tool for opportunistic glaucoma diagnosis in community-based or telemedicine-based services. Binocular chromatic pupillometry allows an accurate, objective, and rapid assessment of retinal structural impairment and functional loss in glaucomatous eyes of different severity levels.
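
The PLR features named above have simple signal-level definitions; the sketch below computes constriction amplitude, peak constriction velocity, and times to maximum constriction/dilation from a sampled pupil-diameter trace. Exact clinical formulas may differ; these are common, simplified choices applied to synthetic data.

```python
# PLR feature extraction from a pupil-diameter time series.
import numpy as np

def plr_features(diameter, fs, stim_onset_s):
    """diameter: pupil diameter (mm) sampled at fs Hz; stimulus at stim_onset_s."""
    t = np.arange(len(diameter)) / fs
    baseline = diameter[t < stim_onset_s].mean()
    post = diameter[t >= stim_onset_s]
    t_post = t[t >= stim_onset_s] - stim_onset_s
    i_min = np.argmin(post)                       # maximum constriction
    velocity = np.gradient(post, 1.0 / fs)        # mm/s
    return {
        "constriction_amplitude_mm": baseline - post[i_min],
        "time_to_max_constriction_s": t_post[i_min],
        "peak_constriction_velocity_mm_s": -velocity[: i_min + 1].min(),
        "time_to_max_dilation_s": t_post[i_min + np.argmax(post[i_min:])],
    }

fs = 30.0
t = np.arange(0, 10, 1 / fs)
trace = 6 - 1.5 * np.exp(-((t - 2.5) ** 2))       # synthetic PLR, stimulus at ~2 s
print(plr_features(trace, fs, stim_onset_s=2.0))
```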

12.
Sensors (Basel) ; 23(10)2023 May 13.
Article in English | MEDLINE | ID: mdl-37430638

ABSTRACT

New CMOS imaging sensor (CIS) techniques in smartphones have helped user-generated content displace traditional DSLR photography in our daily lives. However, tiny sensor sizes and fixed focal lengths also lead to grainier details, especially in zoom photos. Moreover, multi-frame stacking and post-sharpening algorithms can produce zigzag textures and over-sharpened appearances, whose quality traditional image-quality metrics may over-estimate. To solve this problem, we first construct a real-world zoom photo database, which includes 900 tele-photos from 20 different mobile sensors and ISPs. We then propose a novel no-reference zoom quality metric which incorporates a traditional estimate of sharpness and the concept of image naturalness. More specifically, for the measurement of image sharpness, we are the first to combine the total energy of the predicted gradient image with the entropy of the residual term under the framework of free-energy theory. To further compensate for the over-sharpening effect and other artifacts, a set of model parameters of mean subtracted contrast normalized (MSCN) coefficients are utilized as natural statistics representatives. Finally, these two measures are combined linearly. Experimental results on the zoom photo database demonstrate that our quality metric achieves SROCC and PLCC over 0.91, while the performance of a single sharpness or naturalness index is around 0.85. Moreover, compared with the best tested general-purpose and sharpness models, our zoom metric outperforms them by 0.072 and 0.064 in SROCC, respectively.
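
The MSCN coefficients used as naturalness representatives follow the standard BRISQUE-style definition, which can be implemented directly; the Gaussian window width below is the conventional choice, and the constant c is a stabilizer.

```python
# Mean subtracted contrast normalized (MSCN) coefficients.
import numpy as np
from scipy.ndimage import gaussian_filter

def mscn(image, sigma=7 / 6, c=1.0):
    img = image.astype(np.float64)
    mu = gaussian_filter(img, sigma)                     # local mean
    var = gaussian_filter(img**2, sigma) - mu**2         # local variance
    return (img - mu) / (np.sqrt(np.maximum(var, 0)) + c)

image = np.random.randint(0, 256, (256, 256))
coeffs = mscn(image)
# Naturalness models then fit a generalized Gaussian to these coefficients;
# the fitted shape/scale parameters serve as the natural-statistics features.
```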

13.
IEEE Trans Image Process ; 32: 3847-3861, 2023.
Article in English | MEDLINE | ID: mdl-37428674

ABSTRACT

In recent years, User Generated Content (UGC) has grown dramatically in video sharing applications. It is necessary for service providers to use video quality assessment (VQA) to monitor and control users' Quality of Experience when watching UGC videos. However, most existing UGC VQA studies only focus on the visual distortions of videos, ignoring that perceptual quality also depends on the accompanying audio signals. In this paper, we conduct a comprehensive study of UGC audio-visual quality assessment (AVQA) from both subjective and objective perspectives. Specifically, we construct the first UGC AVQA database, named the SJTU-UAV database, which includes 520 in-the-wild UGC audio and video (A/V) sequences collected from the YFCC100m database. A subjective AVQA experiment was conducted on the database to obtain the mean opinion scores (MOSs) of the A/V sequences. To demonstrate the content diversity of the SJTU-UAV database, we give a detailed analysis of it alongside two synthetically distorted AVQA databases and one authentically distorted VQA database, from both the audio and video aspects. Then, to facilitate the development of the AVQA field, we construct a benchmark of AVQA models on the proposed SJTU-UAV database and the two other AVQA databases; the benchmark consists of AVQA models designed for synthetically distorted A/V sequences and models built by combining popular VQA methods and audio features via a support vector regressor (SVR). Finally, considering that the benchmark AVQA models perform poorly in assessing in-the-wild UGC videos, we further propose an effective AVQA model that jointly learns quality-aware audio and visual feature representations in the temporal domain, which is seldom investigated by existing AVQA models. Our proposed model outperforms the aforementioned benchmark AVQA models on the SJTU-UAV database and the two synthetically distorted AVQA databases. The SJTU-UAV database and the code of the proposed model will be released to facilitate further research.
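
A minimal sketch of the SVR-fusion baseline family described above: any VQA method's score is concatenated with simple audio descriptors and regressed against MOS. The feature choices (RMS energy, spectral centroid) and all data below are placeholder assumptions.

```python
# SVR fusion of a video-quality score with toy audio features.
import numpy as np
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

def audio_features(waveform, fs):
    """Toy audio descriptors: RMS energy and spectral centroid."""
    spectrum = np.abs(np.fft.rfft(waveform))
    freqs = np.fft.rfftfreq(len(waveform), 1 / fs)
    centroid = (freqs * spectrum).sum() / (spectrum.sum() + 1e-12)
    return np.array([np.sqrt(np.mean(waveform**2)), centroid])

rng = np.random.default_rng(0)
n = 520                                           # e.g., an SJTU-UAV-sized set
vqa_scores = rng.uniform(0, 100, (n, 1))          # any VQA method's output
audio = np.stack([audio_features(rng.normal(size=8000), 8000) for _ in range(n)])
X, mos = np.hstack([vqa_scores, audio]), rng.uniform(1, 5, n)

model = make_pipeline(StandardScaler(), SVR(kernel="rbf")).fit(X, mos)
print(model.predict(X[:3]))
```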


Subjects
Learning , Databases, Factual , Video Recording/methods , Humans
14.
Ophthalmol Ther ; 12(4): 2133-2156, 2023 Aug.
Article in English | MEDLINE | ID: mdl-37284935

ABSTRACT

INTRODUCTION: This study aimed to examine the performance of binocular chromatic pupillometry for the objective and rapid detection of primary open-angle glaucoma (POAG), and to explore the association between pupillary light response (PLR) features and structural glaucomatous macular damage. METHODS: Forty-six patients (mean age 41.00 ± 13.03 years) with POAG and 23 healthy controls (mean age 42.00 ± 11.08 years) were enrolled. All participants underwent sequenced PLR tests of full-field and superior/inferior quadrant-field chromatic stimuli using a binocular head-mounted pupillometer. The constriction amplitude, velocity, time to max constriction/dilation, and the post-illumination pupil response (PIPR) were analyzed. Inner retina thickness and volume measurements were determined by spectral-domain optical coherence tomography. RESULTS: In the full-field stimulus experiment, time to pupil dilation was inversely correlated with perifoveal thickness (r = - 0.429, P < 0.001) and perifoveal volume (r = - 0.364, P < 0.001). Dilation time (AUC 0.833) showed good diagnostic performance, followed by the constriction amplitude (AUC 0.681) and PIPR (AUC 0.620). In the superior quadrant-field stimulus experiment, time to pupil dilation was negatively correlated with inferior perifoveal thickness (r = - 0.451, P < 0.001) and inferior perifoveal volume (r = - 0.417, P < 0.001); the dilation time in response to the superior quadrant-field stimulus showed the best diagnostic performance (AUC 0.909). In the inferior quadrant-field stimulus experiment, time to pupil dilation correlated well with superior perifoveal thickness (r = - 0.299, P < 0.001) and superior perifoveal volume (r = - 0.304, P < 0.001). CONCLUSION: Chromatic pupillometry offers a patient-friendly and objective approach to detecting POAG, and impairment of PLR features may serve as a potential indicator of structural macular damage.

15.
IEEE Trans Med Imaging ; 42(11): 3295-3306, 2023 11.
Article in English | MEDLINE | ID: mdl-37267133

ABSTRACT

High-quality pathological microscopic images are essential for physicians and pathologists to make correct diagnoses. Image quality assessment (IQA) can quantify the degree of visual distortion in images and guide the imaging system to improve image quality, thus raising the quality of pathological microscopic images. Current IQA methods are not ideal for pathological microscopy images due to the specificity of such images. In this paper, we present a deep learning-based blind image quality assessment model with a saliency block and a patch block for pathological microscopic images. The saliency block and patch block handle local and global distortions, respectively. To better capture pathologists' areas of interest when viewing pathological images, the saliency block is fine-tuned on pathologists' eye-movement data. The patch block captures global information strongly related to image quality via interactions between image patches at different positions. The performance of the developed model is validated on the in-house Pathological Microscopic Image Quality Database under Screen and Immersion Scenarios (PMIQD-SIS) and cross-validated on five public datasets. Ablation experiments demonstrate the contribution of the added blocks. The dataset and the corresponding code are publicly available at: https://github.com/mikugyf/PMIQD-SIS.


Subjects
Immersion , Microscopy , Databases, Factual
16.
Article in English | MEDLINE | ID: mdl-37030730

ABSTRACT

With the popularity of the mobile Internet, audio and video (A/V) have become the main way people entertain themselves and socialize daily. However, to reduce the cost of media storage and transmission, A/V signals are compressed by service providers before being transmitted to end-users, which inevitably introduces distortions and degrades the end-user's Quality of Experience (QoE). This motivates us to research objective audio-visual quality assessment (AVQA). In the field of AVQA, most previous works focus only on single-mode audio or visual signals, ignoring that users' perceptual quality depends on both. Therefore, we propose an objective AVQA architecture for multi-mode signals based on attentional neural networks. Specifically, we first utilize an attention prediction model to extract the salient regions of video frames. Then, a pre-trained convolutional neural network is used to extract short-time features of the salient regions and the corresponding audio signals. Next, the short-time features are fed into Gated Recurrent Unit (GRU) networks to model the temporal relationship between adjacent frames. Finally, fully connected layers fuse the GRU-modeled temporal features of the A/V signals into the final quality score. The proposed architecture is flexible and can be applied to both full-reference and no-reference AVQA. Experimental results on the LIVE-SJTU Database and UnB-AVC Database demonstrate that our model outperforms state-of-the-art AVQA methods. The code of the proposed method will be made publicly available to promote the development of the AVQA field.
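
A condensed PyTorch sketch of the described pipeline follows: per-frame visual and audio features are each modeled by a GRU, and fully connected layers fuse the final hidden states into one score. The feature dimensions are placeholder assumptions standing in for the attention model and pre-trained CNN, which are omitted here.

```python
# GRU-based temporal modeling of A/V features with FC fusion.
import torch
import torch.nn as nn

class AVQANet(nn.Module):
    def __init__(self, v_dim=512, a_dim=128, hidden=128):
        super().__init__()
        self.v_gru = nn.GRU(v_dim, hidden, batch_first=True)
        self.a_gru = nn.GRU(a_dim, hidden, batch_first=True)
        self.fuse = nn.Sequential(nn.Linear(2 * hidden, 64), nn.ReLU(),
                                  nn.Linear(64, 1))

    def forward(self, v_feats, a_feats):
        # v_feats: (B, T, v_dim) CNN features of salient regions per frame
        # a_feats: (B, T, a_dim) audio features aligned to the same frames
        _, hv = self.v_gru(v_feats)
        _, ha = self.a_gru(a_feats)
        return self.fuse(torch.cat([hv[-1], ha[-1]], dim=1)).squeeze(1)

model = AVQANet()
score = model(torch.randn(4, 30, 512), torch.randn(4, 30, 128))  # (B,) scores
```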

17.
Article in English | MEDLINE | ID: mdl-37022906

ABSTRACT

Video rescaling has recently drawn extensive attention for practical applications such as video compression. Compared to video super-resolution, which focuses on upscaling bicubic-downscaled videos, video rescaling methods jointly optimize a downscaler and an upscaler. However, the inevitable loss of information during downscaling leaves the upscaling procedure ill-posed. Furthermore, the network architectures of previous methods mostly rely on convolution, which aggregates information within local regions and cannot effectively capture relationships between distant locations. To address these two issues, we propose a unified video rescaling framework with the following designs. First, we regularize the information of the downscaled videos via a contrastive learning framework in which hard negative samples for learning are synthesized online. With this auxiliary contrastive objective, the downscaler tends to retain more information that benefits the upscaler. Second, we present a selective global aggregation module (SGAM) to efficiently capture long-range redundancy in high-resolution videos, where only a few representative locations are adaptively selected to participate in the computationally heavy self-attention (SA) operations. SGAM enjoys the efficiency of sparse modeling while preserving the global modeling capability of SA. We refer to the proposed framework as the Contrastive Learning framework with Selective Aggregation (CLSA) for video rescaling. Comprehensive experimental results show that CLSA outperforms video rescaling and rescaling-based video compression methods on five datasets, achieving state-of-the-art performance.
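
The auxiliary contrastive objective can be sketched as an InfoNCE-style loss with hard negatives synthesized online; the blending rule used to create negatives below is an illustrative assumption, not the paper's exact synthesis procedure.

```python
# InfoNCE-style contrastive loss with online-synthesized hard negatives.
import torch
import torch.nn.functional as F

def contrastive_loss(anchor, positive, batch, tau=0.1, n_neg=4, alpha=0.5):
    """anchor, positive: (B, D) embeddings; batch: (B, D) pool for negatives."""
    B = anchor.size(0)
    # Synthesize hard negatives: blend the positive with shuffled samples
    # so negatives stay near, but off, the positive.
    negs = torch.stack([alpha * positive + (1 - alpha) * batch[torch.randperm(B)]
                        for _ in range(n_neg)], dim=1)                # (B, n_neg, D)
    a = F.normalize(anchor, dim=-1)
    pos = (a * F.normalize(positive, dim=-1)).sum(-1, keepdim=True)   # (B, 1)
    neg = torch.einsum("bd,bnd->bn", a, F.normalize(negs, dim=-1))    # (B, n_neg)
    logits = torch.cat([pos, neg], dim=1) / tau
    return F.cross_entropy(logits, torch.zeros(B, dtype=torch.long))  # positive at index 0

loss = contrastive_loss(torch.randn(8, 64), torch.randn(8, 64), torch.randn(8, 64))
```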

18.
IEEE Trans Neural Netw Learn Syst ; 34(11): 8566-8578, 2023 Nov.
Article in English | MEDLINE | ID: mdl-35226610

ABSTRACT

A mesh is a type of data structure commonly used for 3-D shapes. Representation learning for 3-D meshes is essential in many computer vision and graphics applications. The recent success of convolutional neural networks (CNNs) for structured data (e.g., images) suggests the value of adapting insights from CNNs to 3-D shapes. However, 3-D shape data are irregular, since each node's neighbors are unordered. Various graph neural networks for 3-D shapes have been developed with isotropic filters or predefined local coordinate systems to overcome node inconsistency on graphs; however, both limit representation power. In this article, we propose a local structure-aware anisotropic convolutional operation (LSA-Conv) that learns an adaptive weighting matrix for each node of the template according to its neighboring structure and applies shared anisotropic filters. In fact, the learnable weighting matrix is similar to the attention matrix in the Random Synthesizer, a recent Transformer model for natural language processing (NLP). Since the learnable weighting matrices require a large number of parameters for high-resolution 3-D shapes, we introduce a matrix factorization technique that notably reduces the parameter size, denoted LSA-small. Furthermore, a residual connection with a linear transformation is introduced to improve the performance of LSA-Conv. Comprehensive experiments demonstrate that our model produces significant improvements in 3-D shape reconstruction compared to state-of-the-art methods.
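
The parameter-reduction idea behind LSA-small can be illustrated with a rank-r factorization of the per-node weighting matrices; the PyTorch sketch below uses simplified shapes and a linear layer standing in for the shared anisotropic filter, both assumptions rather than the paper's exact operator.

```python
# Low-rank per-node weighting matrices: W_i = U_i @ V_i instead of full (C, C).
import torch
import torch.nn as nn

class LSASmall(nn.Module):
    def __init__(self, n_nodes, channels, rank=4):
        super().__init__()
        self.U = nn.Parameter(torch.randn(n_nodes, channels, rank) * 0.1)
        self.V = nn.Parameter(torch.randn(n_nodes, rank, channels) * 0.1)
        self.shared = nn.Linear(channels, channels)    # stand-in shared filter

    def forward(self, x):
        # x: (B, n_nodes, channels) features on a fixed mesh template
        w = self.U @ self.V                            # (n_nodes, C, C), rank-r
        adapted = torch.einsum("bnc,ncd->bnd", x, w)   # per-node adaptation
        return self.shared(adapted) + x                # residual connection

layer = LSASmall(n_nodes=1000, channels=32, rank=4)
out = layer(torch.randn(2, 1000, 32))
# Full matrices: 1000*32*32 = 1.02M params; factorized: 2*1000*32*4 = 0.26M.
```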

19.
IEEE Trans Cybern ; 53(6): 3651-3664, 2023 Jun.
Article in English | MEDLINE | ID: mdl-34847052

ABSTRACT

Existing no-reference (NR) image quality assessment (IQA) metrics remain unconvincing for evaluating the quality of camera-captured images. To tackle this issue, in this article we establish a novel NR quality metric for reliably quantifying the quality of camera-captured images. Since image quality is perceived hierarchically, from low-level preliminary visual perception to high-level semantic comprehension in the human brain, our metric characterizes image quality by exploiting both low-level image properties and high-level image semantics. Specifically, we extract a series of low-level features to characterize fundamental image properties, including brightness, saturation, contrast, noisiness, sharpness, and naturalness, which are highly indicative of camera-captured image quality. Correspondingly, high-level features are designed to characterize the semantics of the image. The low-level and high-level perceptual features play complementary roles in measuring image quality. To infer the image quality, we employ support vector regression (SVR) to map all the informative features to a single quality score. Thorough tests on two standard camera-captured image databases demonstrate the effectiveness of the proposed metric in assessing image quality and its superiority over state-of-the-art NR quality metrics. The source code is released at https://github.com/YT2015?tab=repositories.
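
The low-level branch reduces to hand-crafted statistics mapped through SVR; the sketch below uses common, simplified formulas for brightness, saturation, contrast, and sharpness, which are assumptions standing in for the paper's exact feature definitions. All data are synthetic.

```python
# Hand-crafted low-level features mapped to a quality score with SVR.
import numpy as np
from sklearn.svm import SVR

def low_level_features(rgb):
    """rgb: (H, W, 3) float image in [0, 1]."""
    gray = rgb.mean(axis=2)
    saturation = (rgb.max(2) - rgb.min(2)) / (rgb.max(2) + 1e-6)
    gy, gx = np.gradient(gray)
    return np.array([gray.mean(),                  # brightness
                     saturation.mean(),            # saturation
                     gray.std(),                   # global contrast
                     np.mean(gx**2 + gy**2)])      # sharpness proxy

rng = np.random.default_rng(0)
images = rng.random((50, 64, 64, 3))
mos = rng.uniform(1, 5, 50)                        # human quality ratings
X = np.stack([low_level_features(im) for im in images])
svr = SVR(kernel="rbf").fit(X, mos)                # features -> quality score
print(svr.predict(X[:3]))
```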

20.
IEEE Trans Pattern Anal Mach Intell ; 45(3): 2864-2878, 2023 Mar.
Article in English | MEDLINE | ID: mdl-35635807

ABSTRACT

The explosive growth of image data facilitates the fast development of image processing and computer vision methods for emerging visual applications, meanwhile introducing novel distortions to processed images. This poses a grand challenge to existing blind image quality assessment (BIQA) models, which are weak at adapting to subpopulation shift. Recent work suggests training BIQA methods on the combination of all available human-rated IQA datasets. However, this approach is not scalable to a large number of datasets and makes it cumbersome to incorporate newly created datasets. In this paper, we formulate continual learning for BIQA, where a model learns continually from a stream of IQA datasets, building on what was learned from previously seen data. We first identify five desiderata in this continual setting, with three criteria to quantify prediction accuracy, plasticity, and stability, respectively. We then propose a simple yet effective continual learning method for BIQA. Specifically, based on a shared backbone network, we add a prediction head for each new dataset and enforce a regularizer that allows all prediction heads to evolve with new data while resisting catastrophic forgetting of old data. The overall quality score is computed by a weighted summation of predictions from all heads. Extensive experiments demonstrate the promise of the proposed continual learning method in comparison to standard training techniques for BIQA, with and without experience replay. The code is publicly available at https://github.com/zwx8981/BIQA_CL.
